System and method for efficiently supporting access to I/O devices through large direct-mapped
data caches.

A data processing system (10) includes a CPU (12) connected to a direct-mapped cache (14) by address
bus (16) and data bus (18). The cache (14) includes a first-level cache (20) connected to a second-level
cache (22) by address bus (24) and data bus (26). The second-level cache (22) of the cache (14) is connected
to address bus (28) and data bus (30) by address bus (32) and data bus (34). The address and data busses (28
and 30) are connected to memory (36) and I/O device (41) by address bus (40), data bus (42), address bus (44)
and data bus (46), respectively. In the system (10), I/O interface (38) decodes physical memory addresses and
responds to addresses in specific ranges using first and second addresses chosen to collide in the data
cache (14). I/O software alternates between the two addresses instead of alternating between
a device register address and a reserved-region address as in prior art systems.






EP 0 436 305 A2



SYSTEM AND METHOD FOR EFFICIENTLY SUPPORTING ACCESS TO I/O DEVICES THROUGH LARGE
DIRECT-MAPPED DATA CACHES



BACKGROUND OF THE INVENTION 

1. Field of the Invention:

The present invention relates generally to a system and method for improving the efficiency of
programmed input/output (PI/O) and polling of input/output (I/O) interfaces in a system with large
direct-mapped data caches. More particularly, it relates to such a system and method which does not
require the use of explicit cache management instructions. Most especially, the invention relates to
such a system and method which combines use of direct-mapped caches, a large number of cache lines,
high cache miss penalties relative to instruction times, and a lack of direct memory access I/O.

2. Description of the Prior Art 

There are several ways to execute I/O operations in a computer system. One which is often attractive is
called "memory-mapped" I/O, where I/O device registers appear in the same physical address space as
main memory, and may thus be accessed via normal load/store instructions. Memory-mapped I/O devices
typically decode physical memory addresses and respond to addresses in specific ranges.

In processors with data caches, one problem with this approach is that the goal of the cache, which is
to suppress references to main memory, conflicts with the goal of instructions used to access the I/O device
registers, which is to cause an I/O access for every load or store instruction. Another way of stating this
problem is that software which is polling an I/O device register must guarantee that the polled address is
not valid in the data cache, or the software will not see the actual register value.

Typical ways of dealing with this problem are:
Non-cached regions of physical address space for I/O device registers; the cache is disabled.
Explicit cache management operations where the I/O software can ask that a particular cache line be
invalidated, possibly causing a write-back.
Indirect cache management instructions useful with direct-mapped caches, where the software generates a
reference to a region of the physical address space known to collide with the cache line being "managed,"
thus causing the line to be invalidated. This other region can be called a "reserved" region, although it
might be used independently for normal memory.

Current trends in processor design are changing several system parameters. Cache lines are getting
larger. Next generation systems may have a 256-byte second-level cache line. This implies the use of
write-back rather than write-through caches. Memory latencies are getting longer in relation to instruction
rate. The cache refill time on the next generation systems might take as long as 200 instruction cycles.

These changes affect the performance of traditional means of dealing with the memory-mapped I/O
problem. Using uncached addresses is simple, but because it generates a cache miss for every I/O
instruction, bandwidth for programmed I/O (PI/O) data transfer is reduced to a tiny fraction of the memory
system bandwidth. In the next generation systems, this fraction might be 1/32 of the basic bandwidth.

Explicit cache management instructions can provide accurate control over the disposition of cache lines,
but create some additional complexity in the central processing unit (CPU) and cache implementations, and
are not present in all architectures. Implicit cache management suffers from high latencies because, in
general, it requires a reference to the reserved region for each reference to an I/O register. It thus requires
two cache misses and refills per I/O reference. One can do better for PI/O data transfer by making the I/O
device's data buffer register as wide as a cache line. Then, almost half of the memory system bandwidth is
available for data transfer. The other half is still used for refilling from the reserved region. It is clear from
this discussion that improvement is required in the traditional means of dealing with memory-mapped I/O
for use in next generation computer systems.

SUMMARY OF THE INVENTION

A system for access to I/O devices through large direct-mapped caches in accordance with this
invention has a central processing unit, a main memory, at least one input/output device and a
direct-mapped cache connected between the central processing unit and the main memory and between the
central processing unit and the at least one input/output device. The at least one input/output device has at
least one register that is addressable with a first address and a second address, chosen to collide in the
direct-mapped cache. As used herein, the term "collide" means that the two addresses both map to the
same word in a cache. The cache cannot simultaneously contain the contents of both memory locations at
the same word. The direct-mapped cache, the main memory and the at least one input/output device are
addressable by means of addresses having a common form. The central processing unit is operable under
control of an input/output program to address the at least one input/output device with the first and second
addresses chosen to collide in the direct-mapped cache in alternating fashion. This allows each cache-miss
read from a register of the input/output device to convey useful information, while guaranteeing that the
value stored in the cache is not "stale".

A method for access to I/O devices through large direct-mapped caches in accordance with this
invention includes addressing the direct-mapped cache, the main memory and the at least one input/output
device with addresses having a common form. The at least one input/output device has at least one register
which is addressed in alternating fashion with first and second addresses chosen to collide in the
direct-mapped cache.

The attainment of the foregoing and related advantages and features of the invention should be more
readily apparent to those skilled in the art, after review of the following more detailed description of the
invention, taken together with the drawings, in which:

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 is a block diagram of a system in accordance with the invention.
Figure 2 is a more detailed block diagram of a portion of the system of Figure 1.
Figure 3 is a diagram of memory address format for the system of Figures 1 and 2.
Figure 4 is a flow diagram showing practice of a portion of the method in accordance with the invention.
Figure 5 is a flow chart showing practice of an embodiment of the method of the invention with the
system of Figures 1 and 2.
Figure 6 is a flow chart showing practice of another embodiment of the method of the invention
with the system of Figures 1 and 2.

DETAILED DESCRIPTION OF THE INVENTION

Turning now to the drawings, more particularly to Figure 1, there is shown a system 10 which uses the
present invention. The system 10 includes a CPU 12 connected to a direct-mapped cache 14 by address
bus 16 and data bus 18. The cache 14 includes a first-level cache 20 connected to a second-level cache 22
by address bus 24 and data bus 26. Cache 14 could be implemented with one level or with more
than two levels of cache, as well. The second-level cache 22 of the cache 14 is connected to address bus
28 and data bus 30 by address bus 32 and data bus 34. The address and data busses 28 and 30 are
connected to memory 36 and I/O device 41 by address bus 40, data bus 42, address bus 44 and data bus
46, respectively. In the system 10, access time delay between the CPU 12 and the cache 14 is much
less than access time delay between the cache 14 and the memory 36. For example, an
access or "hit" on the first-level cache takes 1 cycle. A first-level miss and second-level hit would take 10
cycles. A second-level miss, requiring an access to memory 36, would take 100 to 200 cycles. These ratios
assume a cycle time on the order of 2 nanoseconds. As the cycle time decreases, the ratios
might increase to, for example, 1:30:1000. A floating point unit 48 is connected to the CPU by bus 43.
In practice, the system 10 is implemented with an "integrated" processor, i.e., the CPU 12, floating
point unit 48 and first-level cache 20 are provided together on a single integrated circuit chip 21. Doing this
in a high-speed technology, such as emitter-coupled logic (ECL) circuits or a gallium arsenide (GaAs)
semiconductor integrated circuit, provides an extremely low cycle time. [...]
Unfortunately, it is much harder to reduce the latency of access to main memory proportionately. It
therefore seems to be generally true that the relative cost of a cache miss in terms of the number of
instruction times wasted is going to get progressively worse. In late 1970s technology, a cache miss
wasted about 1 instruction time. In current technology, a cache miss wastes about 10 instruction times.
Within the foreseeable future, cache misses can be expected to cost 100 instruction times.
In order to hide some of this cost, the system 10 uses a large second-level cache 22 between the CPU
12 chip and the main memory 36. The access time from the CPU 12 to the second-level cache 22 can be kept
low, since the second-level cache 22 is small enough to be built out of fast memory devices. The
cache lines 50 (Figure 3) can be made large enough so that the effective bandwidth from main memory 36
is high enough to satisfy the requirements of the CPU 12, provided that the second-level cache 22 is also
large enough to have a sufficiently high hit rate.

Studies indicate that as caches get large, one gets better performance for a given investment using
large direct-mapped caches 20 and 22 rather than not-quite-so-large associative caches. The increased hit
rate from associative caches is not significant, but the increase in cache access time is significant.

In order to make the caches 20 and 22 fast, it is desirable to avoid excess complexity. One source of
complexity is that required to maintain coherency between the cache contents and the actual value of the
data. In a multiprocessor system, the multiple caches involved must be kept consistent somehow. However,
there are many reasons why one would prefer to build a uniprocessor, and in a uniprocessor there is no
need to maintain consistency between the caches of several CPUs. However, it is still necessary to maintain
cache consistency if the system includes I/O devices 41 that appear as memory cells, i.e., memory cells
whose values can change without being written by the CPU 12 via the cache 14. If this can be done without
excess hardware complexity, the price and/or performance of the system 10 will be improved.

In the system 10, I/O interface 38 decodes physical memory addresses and responds to addresses in
specific ranges using at least one register 39. Each I/O device register 39 responds to one of two distinct
physical addresses, which are chosen to collide in the data cache 14. I/O software alternates between the
two addresses for the register 39 instead of alternating between a device register address and a
reserved-region address as in prior art systems. This means that every cache 14 refill or write-back operation is
"useful," in the sense that it references the device register 39 in question. This approach works without
changes to the instruction architecture and is simple to implement and program. Polls (read or write) require
only one cache-refill latency. PI/O read transfer proceeds at full memory-system bandwidth.

Arranging for the I/O interface 38 to respond in this manner is not difficult. Since the direct-mapped
cache 14 is a power of two in size, any pair of addresses that differ in at least one bit numbered higher than
the base-2 logarithm of the cache 14 size will collide. The I/O interface 38 ignores one such address bit,
such as the high order bit of the I/O region of the physical address space, to produce this result. If the bit to
be ignored is chosen carefully, neither the interface 38 hardware nor the I/O software need be cognizant of
the actual cache 14 size.
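
The collision condition can be sketched in C. This is only an illustration, not part of the described system: it assumes the 4096-line, 256-byte-line geometry given for the address format of Figure 3, reads "collide" as "distinct addresses that agree in every bit below the base-2 logarithm of the cache size", and uses hypothetical names (`addresses_collide`, `other_alias`) and a caller-supplied ignored-bit mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed example geometry: 4096 lines of 256 bytes each (a power of two). */
#define CACHE_SIZE_BYTES (4096u * 256u)

/* Two distinct addresses collide when they agree in every bit below
   log2(cache size), i.e. they differ only in tag bits. */
static bool addresses_collide(uint32_t a, uint32_t b)
{
    return a != b && ((a ^ b) & (CACHE_SIZE_BYTES - 1u)) == 0;
}

/* Flip the single address bit that the I/O interface ignores to obtain
   the register's other alias. The mask value is chosen by the caller. */
static uint32_t other_alias(uint32_t addr, uint32_t ignored_bit_mask)
{
    /* The mask must name exactly one bit. */
    assert(ignored_bit_mask != 0 &&
           (ignored_bit_mask & (ignored_bit_mask - 1u)) == 0);
    return addr ^ ignored_bit_mask;
}
```

If the ignored bit is the highest address bit, the two aliases collide for any cache smaller than the address space, which is why neither hardware nor software needs to know the actual cache size.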

Figure 2 shows one form of hardware to implement this form of addressing. An address recognizer 45
is connected to the address bus 28 by bus 47. Addresses on the address bus 28 are supplied to a
comparator 49 where they are compared with a constant value bbxbbbbb representing the address of the
register 39, where the "x" bit of the value represents a "don't care" value, so that it is ignored in the
address. The output of the comparator 49 is connected as a control input to enable the register 39. This
means that two different addresses on the address bus 28 will select the register 39, so that data on the
data bus 30 will be supplied via register 39 to the I/O device 41. While a single register 39 is suitable for
the invention, the register 39 can also be implemented as a large number of registers in the form of a buffer
memory, so that the register 39 can be replicated many times for a single I/O device 41.
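
The behavior of the comparator 49 can be modelled in software as a masked comparison. The sketch below is an assumption-laden illustration: the structure `AddressRecognizer`, its field names and the 32-bit address width are hypothetical, and the actual comparator is hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical software model of the address recognizer. */
typedef struct {
    uint32_t reg_addr;   /* constant value identifying the register */
    uint32_t dont_care;  /* 1 bits mark positions the comparator ignores */
} AddressRecognizer;

/* Returns true when the bus address matches the register address in
   every bit position that is not marked "don't care". */
static bool recognizer_selects(const AddressRecognizer *r, uint32_t bus_addr)
{
    return ((bus_addr ^ r->reg_addr) & ~r->dont_care) == 0;
}
```

With one don't-care bit, exactly two bus addresses enable the register, which is the pair the I/O software alternates between.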

Figures 3 and 4 show memory address format 52 used in the system 10 and how lines 50 of cache 14
are mapped to main memory 36 in system 10. The memory address format 52 has a 12-bit tag field 54, a
12-bit line number field 56 and an 8-bit byte offset field 58. As shown, cache lines 50 numbered 0 through
4095 map to main memory lines 60 numbered 0 through 4095, 4096 through 8191, and so forth, depending
on their tag 62. Lines 50 and 60 are 256-byte lines, requiring the 8-bit byte offset field 58. The 4096-line
cache 14 requires the 12-bit line number field 56. This memory address format is representative of practice
of the invention, but a wide variety of other memory address formats could be used.
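
As a minimal sketch of this address format, the following C function splits a 32-bit address into the three fields. The field widths come from the text; the names `AddrFields` and `split_address`, and the placement of the tag in the most significant bits, are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Field widths from the address format described above. */
enum { OFFSET_BITS = 8, LINE_BITS = 12, TAG_BITS = 12 };

typedef struct {
    uint32_t tag;     /* selects which main memory line occupies the cache line */
    uint32_t line;    /* cache line number, 0 through 4095 */
    uint32_t offset;  /* byte within the 256-byte line */
} AddrFields;

/* Assumes the layout tag | line | offset from most to least significant bit. */
static AddrFields split_address(uint32_t addr)
{
    AddrFields f;
    f.offset = addr & ((1u << OFFSET_BITS) - 1u);
    f.line   = (addr >> OFFSET_BITS) & ((1u << LINE_BITS) - 1u);
    f.tag    = (addr >> (OFFSET_BITS + LINE_BITS)) & ((1u << TAG_BITS) - 1u);
    return f;
}
```

For example, main memory line 4096 begins at byte address 4096 * 256, which this split places in cache line 0 with tag 1, matching the mapping of lines 4096 through 8191 onto cache lines 0 through 4095.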

Further understanding of the invention is provided by considering four cases of I/O operations in the
system 10: reading a status register, writing a status register, doing a data input PI/O transfer, and doing a
data output PI/O transfer. In the following discussion, examples are given in C programming language code.
The invention can, of course, be practiced with any suitable programming language.

Figures 5 and 6 provide background on the operation of a direct-mapped write-back cache, necessary
in understanding how PI/O operations depend on cache operation. Figure 5 is a flow chart showing the
steps in a read operation from the direct-mapped cache 14 using write-back. The line number 56 and tag
54 from address 52 supplied by the CPU 12 are used at step 70 to select the appropriate line 50 from the
cache 14. The tag of the address to be read is compared at 72 with the tag 62 found in the cache 14. If the
tags match at 74, the line number 56 from the cache 14 is used at 76 with the byte offset 58 to select bytes
from the specific line 50 of the cache 14 at 78. If the tags do not match at 74, a test is run for a "dirty" line
50 at 80. A dirty line is a line in the cache 14 whose value has been modified by a write operation from the
CPU 12, and the new value has not yet been updated in main memory 36. If the line 50 is dirty, the line is
written to memory 36 at 82. If the line 50 is not dirty, or after the line is written to memory if dirty, a line 60
is obtained from memory 36 at 84. The line 60 and tags 62 are stored in the cache 14 at 86. The line
number 56 from the cache 14 and the byte offset 58 are then used to select bytes from the line 50 at 78 as
before. The selected bytes are then returned to the processor 12 at 88.
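
The hit/miss decision of this read path, and the reason alternating between colliding addresses always returns a fresh device value, can be illustrated with a toy model. Everything here is hypothetical and simplified: the cache has 16 one-word lines, every address is backed by a single simulated device value, and the dirty-line write-back step is omitted because the model never writes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { SIM_LINES = 16 };  /* toy cache: 16 lines of one word each */

typedef struct {
    bool     valid[SIM_LINES];
    uint32_t tag[SIM_LINES];
    uint32_t data[SIM_LINES];
    unsigned refills;      /* number of misses serviced so far */
} SimCache;

/* Stand-in for a device register whose value changes behind the cache. */
static uint32_t sim_device_value;

/* Read through the cache: select the line, compare tags, refill on a miss. */
static uint32_t sim_read(SimCache *c, uint32_t addr)
{
    uint32_t line = addr % SIM_LINES;  /* line number selects the cache line */
    uint32_t tag  = addr / SIM_LINES;  /* remaining high bits form the tag */
    if (!c->valid[line] || c->tag[line] != tag) {
        c->data[line]  = sim_device_value;  /* refill from the device */
        c->tag[line]   = tag;
        c->valid[line] = true;
        c->refills++;
    }
    return c->data[line];
}
```

Reading the same address twice hits in the cache and can return a stale value, while reading the colliding alias forces a refill; each such refill carries the current device value, so no refill is wasted.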

When reading a single I/O interface 38 register 39, the I/O software executes the following:
    int *regaddr;
    int value;

    value = regaddr[0];
    regaddr = XOR(regaddr, COLLIDEMASK);

where COLLIDEMASK is a bitmask with a 1-bit where the I/O interface 38 ignores the physical address bit,
and 0-bits elsewhere. The I/O software thus alternates between the two views of the interface 38, and never
requires an extra cache refill until some other activity requires that cache line 50.
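
The text does not define XOR for pointers, and standard C does not allow the ^ operator on a pointer directly. One possible definition, offered only as a sketch, converts through `uintptr_t`. The helper name `toggle_alias`, the choice of bit 31 for COLLIDEMASK, and the use of `volatile` (so that a compiler does not elide repeated register loads) are all assumptions rather than details from the source.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mask: a 1 bit where the I/O interface ignores the address bit. */
#define COLLIDEMASK ((uintptr_t)1 << 31)

/* Switch a register pointer to its other alias by flipping the ignored bit.
   The integer/pointer conversions are implementation-defined in C. */
static volatile int *toggle_alias(volatile int *regaddr)
{
    return (volatile int *)((uintptr_t)regaddr ^ COLLIDEMASK);
}
```

Applying the helper twice returns the original pointer, so the software can alternate indefinitely without tracking which alias is current.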

When reading several registers that may safely be read in a single operation, the I/O software executes
the following:
    value0 = regaddr[0];
    value1 = regaddr[1];
    value2 = regaddr[2];
    regaddr = XOR(regaddr, COLLIDEMASK);

This requires only one cache refill for the three reads, if all three registers map to the same cache line 50.
Use of an uncached address mechanism here would require three refills.

As shown in Figure 6, a write operation on the direct-mapped cache 14 using write-back is similar. The
line number 56 and tag 54 are used at step 90 to select the appropriate line 50 from the cache 14. The tag
of the address to be written is compared at 92 with the tag 62 found in the cache 14. If the tags match at
94, the line 50 is chosen at 96 for data to be written at 95 with the byte offset 58 to insert new bytes into
the line 50 at 98. If the tags do not match at 94, a test is run for a dirty line 50 at 100. If the line 50 is dirty,
the line is written to memory 36 at 102. If the line 50 is not dirty, or after the line is written to memory if
dirty, a line 60 is obtained from memory 36 at 104. The line 60 and tags 62 are stored in the cache 14 at
106. The line number 56 from the cache 14 and the byte offset 58 are then used to write data on the line 50
at 98 and store the line 60 and tags 62 in the cache 14 as before.

When the I/O software is writing an I/O device register 39 in the system 10, it executes the following:
    regaddr[0] = value;
    regaddr = XOR(regaddr, COLLIDEMASK);
    dummy = regaddr[0];

The second reference is necessary to cause the line to be written out of the cache 14, since the cache 14
is not write-through. Note that because the normal caching mechanism is in use, and because the cache
line is wider than the register 39, the cache system will insist on reading each cache line 50 before writing
it. This results in twice the latency as would be necessary with explicit cache management instructions.
Note also, however, that the next write access to this register 39 will probably proceed without requiring an
additional refill operation, since the second reference will have resulted in the corresponding address being
valid in the cache 14. If the next access to this I/O interface 38 might be a read reference, and the contents
of the device register 39 might change in the interim, it is necessary to alternate the address once again. As
with the read case, if one can safely write several registers at once that share the same cache line 50, the
address alternation may be postponed, thus amortizing the overhead.

An example of a PI/O read data transfer is reading a buffer from a disk controller. In traditional PI/O
designs, the device's data buffer register is one word wide. For best performance, the present invention
uses a buffer register instead that is as wide as a cache line 50. This buffer register will be treated as N
adjacent registers, where N = line size/word size. To transfer a disk buffer, one would write:
    int buffer[BUFERSIZE];
    linewords = LINESIZE/WORDSIZE;
    for (i = 0; i < BUFERSIZE; i += linewords) {
        for (j = 0; j < linewords; j++) {
            buffer[i+j] = regaddr[j];
        }
        regaddr = XOR(regaddr, COLLIDEMASK);
    }

The inner loop requires one cache refill for each instance of the entire inner loop (not each iteration) and
transfers one line of data from the device to a memory buffer. One should strive to ensure that the buffer
does not collide with the device register, at least in the second-level cache 22. Since all the cache refills are
for useful data, the transfer proceeds more or less at full available bandwidth.
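
The refill counts implied by the discussion can be compared with a small calculation. This is an illustrative sketch under stated assumptions: the function names are hypothetical, the buffer length is taken to be a multiple of the line length, collisions with the memory buffer are ignored, and the reserved-region figure assumes the prior-art variant in which the device's data buffer register is already as wide as a cache line.

```c
#include <assert.h>

/* Uncached addresses: every word read is its own miss. */
static unsigned refills_uncached(unsigned n_words)
{
    return n_words;
}

/* Reserved-region method with a line-wide register: one device refill
   plus one reserved-region refill per line transferred. */
static unsigned refills_reserved_region(unsigned n_words, unsigned line_words)
{
    return 2u * (n_words / line_words);
}

/* Alternating colliding addresses: one device refill per line transferred. */
static unsigned refills_alternating(unsigned n_words, unsigned line_words)
{
    return n_words / line_words;
}
```

For a 1024-word buffer and 64 words per line (for instance, 256-byte lines with an assumed 4-byte word), these give 1024, 32 and 16 refills respectively, and in the alternating case every refill carries device data.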

A PI/O write data transfer, such as writing a disk buffer, assuming a write-back cache, executes the
following:
    int buffer[BUFERSIZE];
    linewords = LINESIZE/WORDSIZE;
    for (i = 0; i < BUFERSIZE; i += linewords) {
        for (j = 0; j < linewords; j++) {
            regaddr[j] = buffer[i+j];
        }
        regaddr = XOR(regaddr, COLLIDEMASK);
    }
    dummy = regaddr[0];    /* cause "dirty miss" on last line */

A dirty miss is a reference to the cache 14 that cannot be satisfied by the current contents of the cache,
where the line that will be used to hold the referenced word once it is fetched from main memory is currently
dirty. That is, the line contains a value that must be written back to main memory 36 before the line can be
used to hold the value needed for the current reference. Because the cache system will do a refill on the
first write to each new cache line 50, write transfers done in this way will "waste" about half of the memory
bandwidth with useless reads from the device 41 register 39. This is worse than the performance obtained
with explicit cache management instructions, which have no wasted reads, but better than that obtained with
the reserved-region method, which has two wasted reads per line written. Also, note that the device's data
buffer register must be designed so that these cache-filling reads do not cause trouble. Such trouble can
arise from the practice in some cases of designing device registers so that a read reference has
side-effects, i.e., reading the register causes some action besides simply the return of the value. In some cases,
the side-effects are consequences of poor design decisions, such as when reading from the register causes
the device to start a physical operation. However, often one natural approach is to use one data register as
a "window" on an entire buffer. Each time the register is read, it automatically steps through the buffer. In
fact, the above examples assume this organization. In the read-transfer case, since we are arranging for
exactly the right number of reads from the register, and in the right order, there is no problem. In the
write-transfer case, we are doing exactly the right number of writes, but the cache may also be doing refill reads
on each of the dirty misses. We must therefore design the write buffer register so that read references,
unlike write references, do not "step" it through the underlying buffer memory.
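
The design constraint on the write buffer register can be sketched as a small software model of a "window" register. The type `WriteWindow`, the buffer length and the value returned by a read are hypothetical choices for illustration; the point shown is only that writes advance the window while reads, such as cache refills, leave it untouched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { DEV_BUF_WORDS = 8 };  /* assumed size of the underlying buffer */

typedef struct {
    uint32_t buf[DEV_BUF_WORDS];  /* underlying buffer memory */
    size_t   pos;                 /* next slot the window will fill */
} WriteWindow;

/* A write reference stores the word and steps the window forward. */
static void window_write(WriteWindow *w, uint32_t value)
{
    assert(w->pos < DEV_BUF_WORDS);
    w->buf[w->pos++] = value;
}

/* A read reference (for example a cache refill) returns a harmless
   value and deliberately does not step the window. */
static uint32_t window_read(const WriteWindow *w)
{
    (void)w;
    return 0;
}
```

Interleaving any number of reads between the writes leaves the buffer contents and position exactly as the writes alone would have produced them.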

It should be apparent to those skilled in the art that various changes in form and details of the invention
as shown and described may be made. It is intended that such changes be included within the spirit and
scope of the claims appended hereto.



Claims 






1. A data processing system, which comprises a central processing unit, a main memory, at least one
input/output device, a direct-mapped cache connected between said central processing unit and said
main memory and between said central processing unit and said at least one input/output device, said
at least one input/output device having at least one register being addressable by a first address and a
second address, the first address and the second address being chosen to collide in said
direct-mapped cache, said direct-mapped cache, said main memory and said at least one input/output device
being addressable by means of addresses having a common form, said central processing unit being
operable under control of an input/output program to address said at least one input/output device in
alternating fashion with the first and second addresses chosen to collide in said direct-mapped cache.

2. The data processing system of Claim 1 in which said direct-mapped cache comprises a first-level
cache and a second-level cache.

3. The data processing system of Claim 2 in which an access time between said central processing unit
and said first-level cache is small relative to an access time between said second-level cache and said
main memory.

4. The data processing system of Claim 2 additionally comprising a floating point unit connected to said
central processing unit.

5. The data processing system of Claim 4 in which said central processing unit, said floating point unit
and said first-level cache are implemented together on a single integrated circuit chip.

6. The data processing system of Claim 1 in which said central processing unit is connected to said
direct-mapped cache by a first address bus and a first data bus and said direct-mapped cache is
connected to said main memory and to said at least one input/output device by a second address bus
and a second data bus.

7. The data processing system of Claim 1 in which said system is operable with memory addresses
having a format including a tag field, a line field and a byte offset field.

8. The data processing system of Claim 7 in which memory addresses for said at least one input/output
device have a bit position ignored by said at least one input/output device and said central processing
unit is operable with a bitmask to examine the bit position ignored by said at least one input/output
device to address said at least one input/output device with the first and second addresses in the
alternating fashion.

9. The data processing system of Claim 8 in which said cache and a register of said at least one
input/output register have equal line sizes.
10. The data processing system of Claim 1 in which said at least one input/output device is connected to
said direct-mapped cache by an address bus and a data bus, said system additionally comprising an
address recognizer connected to said address bus, said address recognizer including a comparator
connected to receive addresses on said address bus as a first input and a value which will address the
at least one register as a second input for comparison with the first input, said comparator being
configured to ignore a bit in the addresses, an output of said comparator being connected as a control
input to the at least one register, the at least one register being connected to said data bus.

11. The data processing system of Claim 1 in which the at least one register of said at least one
input/output device is a buffer memory comprising a plurality of registers.

12. A data processing method which comprises providing a data processing system including a central
processing unit, a main memory, at least one input/output device, a direct-mapped cache connected
between the central processing unit and the main memory and between the central processing unit and
the at least one input/output device, with the at least one input/output device having at least one
register being addressable by a first address and a second address chosen to collide in the
direct-mapped cache, addressing the direct-mapped cache, the main memory and the at least one
input/output device with addresses having a common form, and addressing the at least one register of
the at least one input/output device with the first address and the second address in alternating fashion
with the first and second addresses chosen to collide in the direct-mapped cache.

13. The data processing method of Claim 12 in which the direct-mapped cache is provided with a first-level
cache and a second-level cache.

14. The data processing method of Claim 13 in which an access time between the central processing unit
and the first-level cache is small relative to an access time between the second-level cache and the
main memory.

15. The data processing method of Claim 12 in which the cache, the main memory and the at least one
register of the at least one input/output device are addressed with memory addresses having a format
including a tag field, a line field and a byte offset field.

16. The data processing method of Claim 15 in which memory addresses for the at least one register of
the at least one input/output device have a bit position ignored by the at least one register of the at
least one input/output device, the method further comprising the step of examining the memory
addresses with a bitmask to examine the bit position ignored by the at least one register of the at least
one input/output device to address the at least one register of the at least one input/output device with
the first and second addresses in the alternating fashion.

17. The data processing method of Claim 16 in which the cache and the at least one register of the at least
one input/output device are provided with equal line sizes.

18. The data processing method of Claim 12 in which the at least one register of the at least one
input/output device is addressed with the first address and the second address by comparing
addresses with a value representing the address of the at least one register, ignoring one bit in the first
address and the second address while comparing addresses, and enabling the at least one register
when the value and the addresses match.

19. The data processing method of Claim 12 in which the at least one register is a buffer memory
comprising a plurality of registers.